Learn Apache mod_rewrite: 13 Real-world Examples

This article was written in 2007 and remains one of our most popular posts. If you’re keen to learn more about Apache, you may find this recent article on Apache CloudStack of great interest.

Apache’s low-cost, powerful set of features make it the server of choice for organizations around the world. One of its most valuable treasures is the mod_rewrite module, the purpose of which is to rewrite a visitor’s request URI in the manner specified by a set of rules.

This article will lead you through rewrite rules, regular expressions, and rewrite conditions, and provide a great list of examples.

First off, I’m going to assume that you understand the common reasons for wanting a URI rewriting feature for your web site. If you’d like information about this field, there’s a good primer in the SitePoint article, mod_rewrite: A Beginner’s Guide to URL Rewriting. There, you’ll also find instructions on how to enable it on your own server.

Testing Your Server Setup

Some hosts do not have mod_rewrite enabled (by default it is not enabled). You can find out if your server has mod_rewrite enabled by creating a PHP script with one simple line of PHP code:

phpinfo();

If you load the script with a browser, look in the Apache Modules section. If mod_rewrite isn’t listed there, you’ll have to ask your host to enable it — or find a "good host". Most hosts will have it enabled, so you’ll be good to go.

The Magic of mod_rewrite

Here’s a simple example for you: create three text files named test.html, test.php, and .htaccess.

In the test.html file, enter the following:

<h1>This is the HTML file.</h1>

In the test.php file, add this:

<h1>This is the PHP file.</h1>

Create the third file, .htaccess, with the following:

RewriteEngine on 
RewriteRule ^/?test.html$ test.php [L]

Upload all three files (in ASCII mode) to a directory on your server, and type:

http://www.example.com/path/to/test.html

into the location box — using your own domain and directory path of course! If the page shows "This is the PHP file", it’s working properly! If it shows "This is the HTML file," something’s gone wrong.

If your test worked, you’ll notice that the test.html URI has remained in the browser’s location box, yet we’ve seen the contents of the test.php file. You’ve just witnessed the magic of mod_rewrite!

mod-rewrite Regular Expressions

Now we can begin rewriting your URIs! Let’s imagine we have a web site that displays city information. The city is selected via the URI like this:

http://www.example.com/display.php?country=USA&state=California&city=San_Diego

Our problem is that this is way too long an unfriendly to users. We’d much prefer it if visitors could use:

http://www.example.com/USA/California/San_Diego

We need to be able to tell Apache to rewrite the latter URI into the former. In order for the display.php script to read and parse the query string, we’ll need to use regular expressions to tell mod_rewrite how to match the two URIs. If you’re not familiar with regular expressions (regex), many sites provide excellent tutorials. At the end of this article, I’ve listed the best pages I’ve found on the topic. If you’re not able to follow my explanations, I recommend reviewing the first two of those links.

A very common approach is to use the expression (.*). This expression combines two metacharacters: the dot character, which means ANY character, and the asterisk character, which specifies zero or more of the preceding character. Thus, (.*) matches everything in the {REQUEST_URI} string. {REQUEST_URI} is that part of the URI which follows the domain up to but not including the ? character of a query string, and is the only Apache variable that a rewrite rule attempts to match.

Wrapping the expression in brackets stores it in an "atom," which is a variable that allows the matched characters to be reused within the rule. Thus, the expression above would store USA/California/San_Diego in the atom. To solve our problem, we’d need three of these atoms, separated by the subdirectory slashes (/), so the regex would become:

(.*)/(.*)/(.*)

Given the above expression, the regex engine will match (and save) three values separated by two slashes anywhere in the {REQUEST_URI} string. To solve our specific problem, though, we’ll need to restrict this somewhat — after all, the first and last atoms above could match anything!

To begin with, we can add the start and end anchor characters. The ^ character matches matching characters at the start of a string, and the $ character matches characters at the end of a string.

^(.*)/(.*)/(.*)$

This expression specifies that the whole string must be matched by our regex; there cannot be anything else before or after it.

However, this approach still allows too many matches. We’re storing our matches as atoms, and will be passing them to a query string, so we have to be able to trust what we match. Matching anything with (.*) is too much of a potential security hazard, and, when used inappropriately, could even cause mod_rewrite to get stuck in a loop!

To avoid unnecessary problems, let’s change the atoms to specify precisely the characters that we will allow. Because the atoms represent location names, we should limit the matched characters to upper and lowercase letters from A to Z, and because we use it to represent spaces in the name, the underscore character (_) should also be allowed. We specify a set using square brackets, and a range using the - character. So the set of allowed characters is written as [a-zA-Z_]. And because we want to avoid matching blank names, we add the + metacharacter, which specifies a match only on one or more of the preceding character. Thus, our regex is now:

^([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$

The {REQUEST_URI} string starts with a / character. Apache changed regex engines when it changed versions, so Apache version 1 requires the leading slash while Apache 2 forbids it! We can satisfy both versions by making the leading slash optional with the expression ^/? (? is the metacharacter for zero or one of the preceding character). So now we have:

^/?([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$

With regex in hand, we can now map the atoms to the query string:

display.php?country=$1&state=$2&city=$3
$1 is the first (country) atom, $2 is the second (state) atom and $3 is the third (city) atom. Note that there can only be nine atoms created, in the order in which the opening brackets appear -- $1 ... $9 in a regular expression.

We're almost there! Create a new .htaccess file with the text:

RewriteRule ^/?([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$ display.php?country=$1&state=$2&city=$3 [L]

Save this to the directory in which display.php resides. The rewrite rule must go on one line with one space between the RewriteRule statement, the regex, and the redirection (and before any optional flags). We’ve used the [L], or ‘last’ flag, which is the terminating flag (more on flags later).

Our rewrite rule is now complete! The atom values are being extracted from the request string and added to the query string of our rewritten URI. The display.php script will likely extract these values from the query string and use them in a database query or something similar.

If, however, you have only a short list of allowable countries, it might be best to avoid potential database problems by specifying the acceptable values within the regex. Here’s an example:

^/?(USA|Canada|Mexico)/([a-zA-Z_]+)/([a-zA-Z_]+)$

If you’re concerned about capitalization because the values in your database are strictly lowercase, you can make the regex engine ignore the case by adding the No Case flag, [NC], after the rewritten URI. Just don’t forget to convert the values to lowercase in your script after you obtain the $_GET array.

If you want to use numbers (0, 1, … 9) for, say, Congressional Districts, then you’ll need to change an atom’s specification from ([a-zA-Z_]+) to ([0-9]) to match a single digit, ([0-9]{1,2}) to match one or two digits (0 through 99), or ([0-9]+) for one or more digits, which is useful for database IDs.

The RewriteCond Statement

Now that you’ve learned how to use mod_rewrite’s basic RewriteRule statement with the {REQUEST_URI} string, it’s time to see how we can use conditionals to access other variables with the RewriteCond statement. The RewriteCond statement is used to specify the conditions under which a RewriteRule statement should be applied.

RewriteCond is similar in format to RewriteRule in that you have the command name, RewriteCond, a variable to be matched, the regex, and flags. The logical OR flag, [OR], is a useful flag to remember because all RewriteCond and RewriteRule statements are inclusive, in the sense of a logical AND relationship, until terminated by the Last, [L], flag.

You can test many server variables with a RewriteCond statement. You can find a list in the SitePoint article I mentioned previously, but this is the best list of server variables I've found.

As an example, let's assume that we want to force the www in your domain name. To do this, you'll need to test the Apache {HTTP_HOST} variable to see if the www. is already there and, if it's not, redirect to the desired host name:

RewriteCond %{HTTP_HOST} !^www.example.com$ [NC] 
RewriteRule .? http://www.example.com%{REQUEST_URI} [R=301,L]

Here, to denote that {HTTP_HOST} is an Apache variable, we must prepend a % character to it. The regex begins with the ! character, which will cause the condition to be true if it doesn’t match the pattern. We also have to escape the dot character so that it matches a literal dot and not any character, as is the case with the dot metacharacter. We’ve also added the No Case flag to make this operation case-insensitive.

The RewriteRule will match zero or one of any character, and will redirect to http://www.example.com plus the original {REQUEST_URI} value. The R=301, or redirect, flag will cause Apache to issue a HTTP 301 response, which indicates that this is a permanent redirection; the Last flag tells mod_rewrite that you’ve completed this block statement.

RewriteCond statements can also create atoms, but these are denoted with %1 ... %9 in the same way that RewriteRule atoms are denoted with $1 ... $9. You’ll see these atom variables in operation in the examples later on.

Go to page: 1 | 2

Free book: Jump Start HTML5 Basics

Grab a free copy of one our latest ebooks! Packed with hints and tips on HTML5's most powerful new features.